Making the World Safe for Evidence-Based Policy: Let's Slay the Biases in Research on Value-Based Insurance Design


Even as you read this, new monsters are being made, new items crafted, and dungeons overflowing with horrible surprises are being designed to test your bravery and intellect. 1

Ask a teenage boy the object of the on-line game Dragon Fable, the topic of the promotional description that appears above, and you will endure several moments of eye rolling intended to convey your appalling ignorance of cyberspace. Eventually you will get an answer. The point of slaying the dragon, you will be told, is to save the world.
Saving the world is a tall order. We compromise on trying to make health care a little more evidence based; that is, to provide readers with the highest possible quality information about topics important to the practice of managed care pharmacy. We strive to ensure complete disclosure of all factors that potentially affect research results, including both study sponsorship and methodological details, and thereby allow readers to make their own judgments about the applicability and validity of the work. We encourage others to do the same.
Thus, we were dismayed by the eager and sometimes misinformed responses to a recently published study conducted by Chernew and colleagues and reported by the University of Michigan's Center for Value-Based Insurance Design (VBID). The study linked large copayment reductions (from $5/$25/$45 to $0/$12.50/$22.50 for generic drugs, preferred brand drugs, and non-preferred brand drugs, respectively) with improvements of 2-4 percentage points in medication possession ratios (MPR) for 4 of 5 chronic medication classes. 2 Popular press coverage of the study reported that because poor adherence threatens patient health and increases overall health care costs, employers should reduce out-of-pocket (OOP) costs to increase use of preventive medications and improve both medical and economic outcomes. For example, a Reuters report contained a subheading stating that the Chernew et al. study "highlights [a] large employer's efforts to develop a healthier workforce by removing barriers to treatment," and declared that "good health is good business." 3 A headline in another publication, reported in HealthDay News and picked up by the National Institutes of Health's Medline Plus Web site, touted the health benefits of reduced OOP cost sharing: "Lowering co-pays on some drugs help [sic] fight chronic diseases." 4 Because the Chernew et al. study included no examination of health outcomes or costs of any kind, one can reasonably question the origin of these promises that had no apparent basis in study findings. The likely source of the enthusiasm lies both in the study report itself and in the accompanying press release from the Center for VBID at the University of Michigan, combined with the penchant in our culture to seek magic bullets for societal problems. In the study report, the authors predicted that because of clinical evidence supporting medication adherence, they "expect health improvements, although we do not quantify them in this study." 2 The authors' reported expectations for copayment reductions extended to include possible "gains in worker productivity or reduced absenteeism or disability." Notably, no evidence linking copayment reductions to any of these outcomes was either cited by the authors or extant in the study results. The University of Michigan's press release went even further, claiming that study results suggested a "pay off in the long run, in the form of fewer hospitalizations or emergency room visits," 5 although the study included no measure of inpatient hospital or emergency room use.
What evidence actually supports this enthusiasm? Is VBID, as its proponents would claim, a sharp new empirically-supported weapon in the battle to provide cost-effective and high-quality health care? Or is it a Trojan Horse, whose contents may substitute for investments that have a higher rate of success in creating value for money? In answering that question, the knight who wields weapons to slay biases is aware that patient cost sharing is a high-stakes battle, potentially affecting not only individual patient outcomes but also the sustainability of the insurance system in the future. As Curtiss pointed out 4 years ago in similar circumstances, following much "sky-is-falling" media attention to a drug company-sponsored report critical of prescription drug cost-sharing increases, sweeping conclusions are belied by pesky details. Serious problems in the details of that study report, ignored by popular press coverage at the time, included an inequivalent comparison group, selective reporting of findings, and apparent methodological errors including failure to consider angiotensin II receptor blockers (ARBs) as alternative therapy for angiotensin-converting enzyme (ACE) inhibitors. 6 Because technical details are crucial to interpretation of research and policy recommendations, we encourage readers to step away from the press coverage and look at the most recent VBID report in a scientific manner. Doing so quickly reveals more about what the "study" did not show than what it did show.

[...] compared with those of Employer B, which left its cost-sharing structure unchanged. While the analytic approach of a difference-in-difference calculation is generally considered to be strong, study design alone does not ensure methodological rigor. A key question in any analysis that purports to be controlled is whether the study groups would reasonably be expected to respond to the same intervention in similar ways.
The point of a controlled analysis is to introduce a disturbance or change into the experience of an intervention group, while leaving the experience of a comparison group unchanged. However, if the intervention and comparison groups would have responded differently to the same intervention because of confounding factors (baseline differences between the groups), the logic of the comparative design falls apart. [7][8][9] For this reason, it is critical (and a JMCP standard) that readers be given basic descriptive information about the groups in study reports so that the comparability of the groups may be assessed.
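For readers unfamiliar with the arithmetic behind a difference-in-difference estimate, a minimal sketch follows. The MPR values are hypothetical and are not taken from the Chernew et al. report; the function name is likewise an illustrative choice. The calculation subtracts the comparison group's change from the intervention group's change, which is only informative if the two groups would otherwise have moved in similar ways.

```python
def difference_in_differences(pre_treat, post_treat, pre_comp, post_comp):
    """Change in the intervention group minus change in the comparison group."""
    return (post_treat - pre_treat) - (post_comp - pre_comp)


# Hypothetical MPR values in percentage points (NOT from Chernew et al.).
did = difference_in_differences(
    pre_treat=60.0, post_treat=64.0,  # intervention group rises 4 points
    pre_comp=70.0, post_comp=71.0,    # comparison group rises 1 point
)
print(f"Difference-in-difference estimate: {did:.1f} percentage points")
```

If baseline differences cause the groups to respond differently to the same conditions, the subtraction no longer isolates the effect of the intervention, however clean the arithmetic looks.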
In pharmacoeconomic analyses of a benefit design innovation, study reports typically include descriptive information (means, medians, and some measure of variation) about per member per month (PMPM) prescription drug claims and dollars, generic dispensing ratio, member participation rate, and a summary measure of chronic disease burden such as the chronic disease score. [10][11][12][13][14] Total prescription claims count is a particularly useful measure, predicting future health care expenditures more accurately than did several commonly used comorbidity scales in a 2006 study of antihypertensive medication users. 15 Additional important baseline information includes plan design features (e.g., preferred provider organization vs. health maintenance organization, fee-for-service vs. provider-risk arrangements), geographic region (shown to be a key prescription drug utilization factor in previous reports), 16 industry or sector (e.g., white collar vs. blue collar) especially if just 2 employers are being compared (i.e., as opposed to comparing study groups comprising numerous small employers), and other relevant pharmacy benefit design features (e.g., step therapy, mail order option).
Because all this information is absent from the Chernew et al. report, there is no way to know if the study groups were at all comparable, and the small amount of information presented suggests that they were not. The authors do not describe specific copayment amounts for the comparison group but instead report average cost share per prescription, a detail suggesting that perhaps the comparison group did not have a copayment structure at all but was instead operating under a coinsurance design. Additionally, beneficiaries in the comparison group were older by a mean of 7 years (a mean of 45 years vs. a mean of 38 years in 2005 for beneficiaries in the intervention group). The authors statistically controlled for age, gender, previous use of the study medication (binary indicator), and selected comorbidities (identified using an unexplained algorithm) in multivariate analysis; however, in benefit design research, differences in measured factors often signal differences in unmeasured (and therefore uncontrolled) factors. For this reason, statistical controls for measured factors are no substitute for basic comparability of study groups. 9,17 Unfortunately, Chernew et al. also fail to report basic accepted measures of the quality of multivariate analyses (e.g., percent of variance explained) that would have provided quantitative information about the degree to which statistical analysis controlled for confounding factors affecting the comparison.
There is also a question regarding the external validity (generalizability) of the Chernew et al. study findings. While the report provides only scant information about the 2 study groups, the little that can be derived suggests that these groups are not typical of large employers. For example, the proportion of non-employee beneficiaries is very small, with an average beneficiary-to-employee ratio of only 1.39 in the intervention group and 1.52 in the comparison group. Large employers typically have beneficiary-to-employee ratios in the range of 2.00 to 2.50, and the national average in 2005 was 2.26. 18

What was the intervention exactly?
One does not need to be a scientist to arrive at more questions than answers when reading the 2008 report by Chernew et al. An obvious omission is the absence of a description of the comparison group's benefit design. For example, the "copayment rate" for "brand-name drugs" is reported in the text as $29.72 in the comparison group in 2004; the reader must assume that this "rate" refers to actual average OOP cost per prescription. The reader is not informed anywhere in the article if this (average) "copayment" amount in the comparison group is the result of a fixed 2-tier or 3-tier copayment design, a coinsurance design, or some combination. The bewilderment continues when the reader tries to interpret the description of the increase in (average) brand-drug "copayment" in the comparison group by "about $1 per prescription" from 2004 to 2005. The "about 4 percent" increase occurred in the comparison plan that supposedly did not change copayments in 2005. There are at least 2 possible explanations for this increase in OOP cost per claim for a pharmacy benefit design that did not change: (1) the comparison group had a coinsurance component in its benefit design, or (2) there was an increase in the average days supply per pharmacy claim. However, no measure of days supply is reported by the authors, nor is there mention of a mail-order option anywhere in the article.
Confusion rises when the reader attempts to find (or even calculate) fundamental cost measures necessary to understand the intervention's impact on patients, such as overall OOP cost per claim. The reader is told that "weighted average copay rates (brand and generic) fell in the intervention firm by 33.9 percent" but is never told what overall average amounts were actually paid by patients in either period. Because generic dispensing ratios are not reported for either the pre-intervention or post-intervention periods, it is impossible to calculate the overall OOP cost for either study group.
Nor is the reader told what percentage of brand drug use in each study group is attributable to preferred versus non-preferred medications. Applying the intervention group's pre-intervention average cost share for brand medications ($28.55) to the reported copayment design for brand medications ($25 for preferred, $45 for non-preferred) yields a reasonable guess at the preferred brand drug use rate (i.e., 82.3% at $25, and 17.7% at $45, approximates the reported $28.55 OOP average per claim). However, for the comparison group, not even a guess at the preferred brand drug use rate can be made, because of the absence of benefit design detail. Equally important, the reader is similarly left to guess about the content and breadth of the drug formularies in the 2 study groups; in other words, (1) were the drug formularies broad in the preferred drug list or more restrictive, (2) did the comparison group have a drug formulary comparable to that of the intervention group, and (3) was either drug formulary representative of typical commercial health plans?
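The back-calculation of the preferred brand share described above can be sketched as follows. It assumes, as the editorial's guess does, that every brand claim was paid at exactly one of the two reported copayment tiers; the function name is an illustrative choice.

```python
def preferred_share(avg_cost, preferred_copay, nonpreferred_copay):
    """Solve avg = s * preferred + (1 - s) * nonpreferred for the share s."""
    return (nonpreferred_copay - avg_cost) / (nonpreferred_copay - preferred_copay)


# Figures reported for the intervention group before the intervention.
share = preferred_share(avg_cost=28.55, preferred_copay=25.00, nonpreferred_copay=45.00)

# Check: the implied mix should reproduce the reported average cost share.
reconstructed = share * 25.00 + (1 - share) * 45.00
print(f"Implied preferred share: {share:.1%}; reconstructed average: ${reconstructed:.2f}")
```

The same calculation cannot be run for the comparison group, because neither of its tier amounts is reported.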
Reader bewilderment escalates to frustration when trying to decipher how an intervention group generic copayment of $5.00 in the baseline period (2004) can be considered the same as $16.22 for the comparison group. Not only are these cost-share amounts very different in magnitude, the $16.22 (average) generic drug copayment in the comparison group is atypical. The average generic drug copayment was $9.14 for 404 employers in one survey 19 and $10 overall in 2004 and 2005 for the more than 2,000 employers in the national survey sponsored by the Kaiser Family Foundation. 20 The frustration intensifies as the reader is led through a description of the drop in "copayments" for generic drugs in 2005 for the intervention group by "about 70 percent (more than $3)" when the change had been described 2 pages earlier as a reduction "from $5 to zero." The math, which must be performed by the reader, shows a copayment change from $5 to approximately $1.50.
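The inconsistency in the generic copayment figures can be checked in a few lines. This sketch uses only the amounts quoted above; the variable names are illustrative.

```python
baseline_copay = 5.00   # generic copayment in 2004, intervention group
reported_drop = 0.70    # "about 70 percent" decline reported for 2005

# Copayment implied by the reported percentage decline.
implied_copay = baseline_copay * (1 - reported_drop)

# Decline that a change "from $5 to zero" would actually represent.
stated_design_drop = (baseline_copay - 0.00) / baseline_copay

print(f"Implied 2005 average generic copayment: ${implied_copay:.2f}")
print(f"Decline implied by the stated design: {stated_design_drop:.0%}")
```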

Were the intervention outcomes clinically significant?
The Chernew et al. study found that, expressed in MPR percentage points, the effect sizes of the copayment reductions were 2.59 for ACE inhibitors and ARBs, 3.02 for beta-blockers, 4.02 for diabetes drugs, and 3.39 for statins. Amazingly, the authors report sample sizes for only 1 of the 5 classes (diabetes drugs) assessed in the study, but with plan sizes of approximately 38,000 beneficiaries in the intervention group and 70,000 in the comparison group in 2005, it is probable that the sample sizes are well over several thousand for the analyses of ACE inhibitors/ARBs, beta-blockers, and statins, classes for which national prevalence rates are approximately 11%. 21 Even assuming somewhat lower prevalence rates to account for the relatively young age of the study groups, the expected sample sizes are so large that even small differences in MPR will be statistically significant (i.e., unlikely to be due to chance). Whether these differences are clinically or practically significant enough to affect patient outcomes is another matter entirely. The knight looking for a few good weapons in the cost-sharing battle should ask whether a difference of 2-4 percentage points in MPR, representing an increase of just 7-14 days of treatment over the course of an entire year, will actually produce any clinical benefits.
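The conversion from MPR percentage points to days of therapy is simple enough to sketch. It assumes a 365-day measurement period, which is the basis the editorial itself uses; the function name is illustrative.

```python
def mpr_points_to_days(points, days_in_period=365):
    """Convert a change in MPR percentage points to days of medication supply."""
    return points / 100 * days_in_period


# Effect sizes reported by Chernew et al., in MPR percentage points.
effects = {
    "ACE inhibitors/ARBs": 2.59,
    "beta-blockers": 3.02,
    "diabetes drugs": 4.02,
    "statins": 3.39,
}
for drug_class, points in effects.items():
    print(f"{drug_class}: {mpr_points_to_days(points):.1f} additional days per year")
```

Each of the reported effects works out to less than about 15 additional days of therapy per year.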
The authors found no effect of the intervention on MPR for inhaled corticosteroids. This could be an important take-away point from this study, that lowering copayments for inhaled corticosteroids is not likely to improve MPR. However, this finding is just another example of the many questions raised by this report. No change in the MPR for inhaled corticosteroids in the intervention group is quite likely the result of the unreliable nature of the days supply field on pharmacy claims for inhalers and liquid dose forms. 22,23 MPR is of course derived from the days supply field, and the baseline MPR of 31.6 gives the reader a clue about the unreliability of MPR in this research application.

How much did the intervention cost?
A particularly glaring omission in the Chernew et al. report is the lack of a cost-benefit calculation for the intervention. Applying utilization rates and costs from national data 21,24 to the actual mean OOP cost reductions presented in the Chernew et al. report yields estimates of the per member per year (PMPY) intervention costs for ACE inhibitors/ARBs, diabetes drugs, and statins: $7.54, $6.14, and $12.60, respectively (Table 1). However, Chernew et al. report that the intervention group's actual OOP cost reductions in 2005 were less than expected (e.g., brand OOP cost declined by only 29.9% even though the brand copayment rates were cut in half) because of plan phase-in. Assuming the full copayment reduction amount (after phase-in) and applying the same national data, the respective PMPY intervention costs for ACE inhibitors/ARBs, diabetes drugs, and statins would be approximately $11.53, $9.10, and $18.60. In a plan with 100,000 lives, the approximate annual cost increase for those 3 classes combined would range from $2.6 to $3.9 million. For statins, the increase in annual cost per patient would be $114-$168 ($1.26 million to $1.86 million for about 11,100 patients) to add 12.4 days of statin therapy per year (an MPR change of 3.39 percentage points × 365 days). Chernew et al. appear to argue for the possibility of medical cost offsets attributable to increased adherence; however, controlled studies have documented no change in medical service use following implementation of copayment increases in commercially insured populations. 25
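The plan-level and per-patient figures above follow directly from the PMPY estimates, as the sketch below shows. All inputs are the numbers quoted in this editorial; the variable and function names are illustrative, and the 100,000-life plan is the editorial's own example.

```python
LIVES = 100_000           # example plan size used in the text
STATIN_PATIENTS = 11_100  # approximate statin users in such a plan

# PMPY intervention cost estimates in dollars, by medication class.
pmpy_observed = {"ACE inhibitors/ARBs": 7.54, "diabetes drugs": 6.14, "statins": 12.60}
pmpy_full = {"ACE inhibitors/ARBs": 11.53, "diabetes drugs": 9.10, "statins": 18.60}


def plan_cost(pmpy_by_class, lives=LIVES):
    """Total annual plan cost across classes for a plan of the given size."""
    return sum(pmpy_by_class.values()) * lives


low_total = plan_cost(pmpy_observed)
high_total = plan_cost(pmpy_full)

# Statin cost spread over the patients who actually use statins.
statin_low = pmpy_observed["statins"] * LIVES / STATIN_PATIENTS
statin_high = pmpy_full["statins"] * LIVES / STATIN_PATIENTS

# Days of statin therapy gained: 3.39 MPR percentage points over 365 days.
added_days = 3.39 / 100 * 365

print(f"Annual plan cost, 3 classes: ${low_total:,.0f} to ${high_total:,.0f}")
print(f"Annual cost per statin patient: ${statin_low:.0f} to ${statin_high:.0f}")
print(f"Statin therapy added per patient: {added_days:.1f} days per year")
```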

Did enrollee choice of plan bias the analysis?
Providing the reader with information about the cases removed at each stage of sample selection is another basic research standard (and a JMCP requirement). This information, also missing from the Chernew et al. report, is especially important in cost-sharing research. Employees who are married or have a domestic partner benefit option often choose between 2 or more available health benefit packages. An apparent increase in utilization of chronic medications could represent nothing more than a channeling bias effect; for example, during an open enrollment process, employees who know that they or a family member will use chronic medication in the coming year are more likely to select the plan in which the chronic medication is free or available for a low OOP cost. Quantitative analysis of those excluded from the post-intervention analysis due to disenrollment would have shed light on this possibility but was unfortunately not included in the Chernew et al. report. Notably, the intervention group's size increased by 5.8% in 2005, while the comparison group's size decreased by 5.5%. 2

Who sponsored the study?
Careful examination of the history of cost-sharing research over the past 20 years reveals a strong relationship between pharmaceutical manufacturer sponsorship and undesirable findings for prescription drug cost sharing. 25 While sponsorship alone does not negate the findings of rigorous and transparent work, the co-sponsorship of the Chernew et al. study by 2 pharmaceutical manufacturers should give special pause to the reader concerned about the study report's lack of numerous critical details about methodology, study group characteristics, and findings. The study report's lack of transparency should raise more than eyebrows, and readers should view with similar skepticism the assertions of Mark Fendrick, the study's seventh author and a VBID consultant, that "all research to this point has shown individuals will not buy important medical services even if there's a small financial barrier: $5 or even $2," 5 particularly since controlled studies of cost sharing in commercially insured populations have shown exactly the opposite. 25

Basic credibility: Does the study pass the "smell test"?
More than 50 years ago, the late Darrell Huff's now classic volume, How to Lie with Statistics, highlighted the dangers of the "gee-whiz graph." 26 Huff pointed out that merely truncating any graph's ordinate (vertical or Y axis) to remove part of the scale produces the impression that a trend is bigger than it actually is: "Of course, the eye doesn't 'understand' what isn't there, and a small rise has become, visually a big one." To anyone familiar with Huff's work, or at least with graphing technique, a look at Chernew et al.'s Exhibit 2 is disheartening; the first 54 percentage points of the graph, representing more than one half of the MPR scale of 0% to 100%, have been removed from the ordinate scale, giving the erroneous visual impression of dramatic MPR changes.
But there is an even more compelling reason to doubt the presentation of Exhibit 2 in Chernew et al. Rather than showing the X (horizontal) axis on a continuous calendar scale, the authors opted to overlay the lines depicting the preintervention and post-intervention results, thereby making it more difficult for the reader to see trends over time. The true relationship of the intervention to the comparison group becomes more obvious when the data points are shown using a standard graphing technique on a continuous calendar scale (see Figure). First, the MPR for the intervention group never rises to the level of the comparison group. Second, a seasonal trend is evident in both groups, with no apparent change in that pattern following the intervention. Thus, the corrected graph suggests that the cost-sharing decreases had little discernible effect on MPR. Similar doubt is in store for readers who are familiar with the cost-sharing literature when they read Chernew et al.'s statement that "the magnitude of the adherence-improving effect with copay reduction is similar to those estimated in the existing literature" and examine the study cited as evidence for this point.

Dragon Slaying with Insufficient Information: You Can't Fire at What You Can't See
If a reader is experienced, reasonably knowledgeable, and willing to spend a lot of time studying the Chernew et al. report and doing algebraic manipulations to compensate for missing mathematical detail, the flaws in the "evidence" that it provides become obvious. However, a study report so incomplete that readers need years of experience and hours of time to decipher it should be considered unacceptable. What of those who are less familiar with the literature, such as employers and commercial insurance payers or consultants to decision makers? There is danger in swallowing the VBID headlines without a reasonable pounding of the underlying lack of evidence (Table 2). Still, there may be hope for the integrity and longevity of pharmacy benefit plans since plan sponsors may not be so gullible; survey research conducted between July 2005 and March 2006 for 609 of the largest employers in 41 U.S. markets showed that purchasers are not quick to implement "value-based incentives" for employees and beneficiaries. 29

Video games such as the Dragon Fable may be the proper analogy for the challenge that faces scientists struggling to interpret such research reports, or to discern whether employers should spend more money (by reducing member cost share) to save money. Like the experienced Dragon Fable player, managed care decision makers might indeed feel as if new ideas are constantly being designed to test their bravery and intellect. But to those who have studied VBID closely, maybe the fairy tale of The Emperor's New Clothes 30 suits a little better.

Factor to Watch: Question or Issue

Research design rigor 8,9,17
• Unless a randomized control group (ideal) or comparison group is employed, there is little assurance that results are attributable to the cost-sharing intervention.
• Comparison group should be as equivalent to the intervention group as possible (age, gender, industry, baseline cost, baseline medical conditions, and disease severity).
• Statistical controls aren't enough; differences in measured factors may signal differences in unmeasured factors (e.g., organizational culture encouraging cost containment, beneficiary income, benefit management programs such as mandatory mail or step therapy).

Pharmaceutical manufacturer sponsorship 25
• Pharma-sponsorship is usually associated with (1) production of undesirable findings for cost sharing and (2) weak or unusual research designs (e.g., use of comparison groups with markedly different baseline characteristics, study of atypical benefit designs).

Baseline descriptive statistics 28
• Even when a multivariate analysis is presented, baseline descriptive data, such as measures of central tendency (mean and median values) and dispersion (e.g., range, interquartile range [IQR], standard deviation) for key outcomes-related factors (e.g., cost, utilization, chronic disease score), are the only way that a reader can tell whether study populations are reasonably comparable to each other and/or to the reader's population of interest.

Effect of sampling process 28
• Clear explication of the number of cases excluded at each stage of a sampling process is a data presentation standard and a requirement of JMCP.
• Sampling procedures should be presented so clearly that a reader with access to the data could replicate the sample.
• Enrollees can often respond to cost-sharing change with enrollment change. Researchers should check for approximately equivalent patterns of enrollment and disenrollment across study groups.

Results presented for absolute and relative values 28,29
• Relative measures, such as odds ratios and hazard ratios, should be translated into absolute terms (e.g., number needed to treat, number needed to harm, change in number of percentage points) to make results practically meaningful and transparent for the reader.

Adequacy measure(s) for multivariate models 17,28
• Percentage of variance explained (e.g., R-squared, pseudo-R-squared) and/or predictive accuracy (e.g., c-statistic or area under the Receiver Operating Characteristics curve) tell the reader how much (if at all) the statistical model has accounted for confounding factors.

Press release matches research design
• Press release is limited to outcomes actually measured by the study, without making promises about effects on unmeasured outcomes.